LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions

Agostinelli, Victor; Hong, Sanghyun; Chen, Lizhong (July 2024, International Conference on Machine Learning (ICML))

A promising approach to preserving model performance in linearized transformers is to employ position-based re-weighting functions. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making it difficult or impossible to apply them to autoregressive and simultaneous tasks, where the target sequence length, and sometimes even the input sequence length, is unknown. To address this issue, we propose Learned Proportions (LeaP) and LeaPformers. Our contribution is built on two major components. First, we generalize the dependence on explicit positional representations and sequence lengths into a dependence on sequence proportions for re-weighting. Second, we replace static positional representations with dynamic proportions derived via a compact module, enabling more flexible attention concentration patterns. We evaluate LeaPformer against eight representative efficient transformers on the Long-Range Arena benchmark, where it achieves the best quality-throughput trade-off. We also apply LeaPformer to Wikitext-103b autoregressive language modeling and to simultaneous speech-to-text translation for two language pairs, achieving competitive results in both tasks.
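
The idea of re-weighting linear attention by sequence proportions can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the re-weighting function, so the sketch assumes a cosine re-weighting of the form cos(π/2 · (p_i − p_j)) over ReLU feature maps, and `learned_proportion` is a hypothetical stand-in (a sigmoid over a linear projection) for the compact module that derives dynamic proportions. All function and parameter names are illustrative. The sketch is non-causal and uses only the Python standard library.

```python
import math

def relu(vec):
    return [max(0.0, x) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def learned_proportion(token, weights, bias):
    """Hypothetical stand-in for the compact module: maps a token to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(dot(token, weights) + bias)))

def linear_attention(q, k, v, q_prop, k_prop, eps=1e-9):
    """Attention with assumed cosine re-weighting, computed in time linear in sequence length.

    Uses cos(a - b) = cos(a)cos(b) + sin(a)sin(b) to split the re-weighting
    into a query-side factor and a key-side factor.
    """
    d, dv = len(k[0]), len(v[0])
    trigs = (("cos", math.cos), ("sin", math.sin))
    # Key/value summaries, accumulated once over all keys.
    s = {name: [[0.0] * dv for _ in range(d)] for name, _ in trigs}
    z = {name: [0.0] * d for name, _ in trigs}
    for k_j, v_j, p_j in zip(k, v, k_prop):
        fk = relu(k_j)
        for name, trig in trigs:
            scale = trig(0.5 * math.pi * p_j)
            for a in range(d):
                z[name][a] += fk[a] * scale
                for c in range(dv):
                    s[name][a][c] += fk[a] * scale * v_j[c]
    out = []
    for q_i, p_i in zip(q, q_prop):
        fq = relu(q_i)
        num, den = [0.0] * dv, eps
        for name, trig in trigs:
            scale = trig(0.5 * math.pi * p_i)
            for a in range(d):
                den += fq[a] * scale * z[name][a]
                for c in range(dv):
                    num[c] += fq[a] * scale * s[name][a][c]
        out.append([n / den for n in num])
    return out

def quadratic_reference(q, k, v, q_prop, k_prop, eps=1e-9):
    """Same attention computed pairwise, for checking the linear form."""
    out = []
    for q_i, p_i in zip(q, q_prop):
        w = [dot(relu(q_i), relu(k_j)) * math.cos(0.5 * math.pi * (p_i - p_j))
             for k_j, p_j in zip(k, k_prop)]
        den = sum(w) + eps
        out.append([sum(w_j * v_j[c] for w_j, v_j in zip(w, v)) / den
                    for c in range(len(v[0]))])
    return out
```

Because the proportions come from the tokens themselves rather than from an index divided by a known length, nothing in this sketch requires the sequence length in advance. An autoregressive variant would, under the same assumptions, replace the one-off summaries with running sums over the keys seen so far.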
Full Text Available
Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models

Agostinelli, Victor; Wild, Max; Raffel, Matthew; Fuad, Kazi; Chen, Lizhong (August 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL))

Large language models (LLMs) with billions of parameters, pretrained on massive amounts of data, can now approach or exceed state-of-the-art performance in a variety of downstream natural language processing tasks. Neural machine translation (NMT) is one such task to which LLMs have been applied with great success. However, little research has focused on applying LLMs to the more difficult subset of NMT called simultaneous translation (SimulMT), where translation begins before the entire source context is available to the model. In this paper, we address key challenges facing LLMs fine-tuned for SimulMT, validate classical SimulMT concepts and practices in the context of LLMs, explore adapting LLMs that are fine-tuned for NMT to the task of SimulMT, and introduce Simul-LLM, the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.
Full Text Available
Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation

https://doi.org/10.18653/v1/2024.emnlp-main.1017

Raffel, Matthew; Agostinelli, Victor; Chen, Lizhong (January 2024, Association for Computational Linguistics)

Full Text Available
Improving Autoregressive NLP Tasks via Modular Linearized Attention

https://doi.org/10.1007/978-3-031-43421-1_6

Agostinelli, Victor; Chen, Lizhong (January 2023, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD))

Full Text Available